Conversation
Force-pushed from 388cc0e to e50a263.
Pull request overview
Adds “instance group” introspection to the GraphQL API so clients can inspect ReplicaSet-backed rollout revisions, including env vars, mounted files, instances (pods), and recent K8s events.
Changes:
- Introduces the `instancegroup` workload package (models, queries, event translation, dataloader + ReplicaSet watcher).
- Extends the GraphQL schema and resolvers to expose `Application.instanceGroups` and `InstanceGroup` details.
- Adjusts the Pod transformer to retain `metadata.ownerReferences` so pods can be matched to ReplicaSets.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| internal/workload/instancegroup/queries.go | Core listing + introspection logic for env vars, mounts, and instance (pod) membership |
| internal/workload/instancegroup/events.go | Fetches and translates K8s events for ReplicaSets/pods into user-friendly messages |
| internal/workload/instancegroup/models.go | Defines InstanceGroup and related GraphQL-facing models/enums |
| internal/workload/instancegroup/dataloader.go | Adds loaders + ReplicaSet transformer + environment->dynamic client lookup |
| internal/workload/instancegroup/node.go | Registers InstanceGroup as a Node ident type |
| internal/workload/dataloader.go | Keeps Pod ownerReferences in cached pods to enable ReplicaSet ownership matching |
| internal/kubernetes/watchers/watchers.go | Wires a new ReplicaSet watcher into the global watcher set |
| internal/graph/schema/instancegroup.graphqls | Adds GraphQL types/fields for instance groups and related types |
| internal/graph/instancegroup.resolvers.go | Implements resolvers for instanceGroups, env vars, mounts, instances, events |
| internal/cmd/api/http.go | Injects instancegroup loader context into Graph request pipeline |
| .configs/gqlgen.yaml | Adds instancegroup package to gqlgen autobind |
| internal/graph/gengql/applications.generated.go | gqlgen output wiring Application.instanceGroups |
| internal/graph/gengql/instancegroup.generated.go | gqlgen output for InstanceGroup types/resolvers |
| internal/graph/gengql/schema.generated.go | gqlgen output: Node type switch + schema inclusion |
| internal/graph/gengql/root_.generated.go | gqlgen output: resolver root + complexity hooks |
| internal/graph/gengql/teams.generated.go | gqlgen output: Application field context updates |
| internal/graph/gengql/issues.generated.go | gqlgen output: Application field context updates |
```go
case env.ValueFrom != nil && env.ValueFrom.ConfigMapKeyRef != nil:
	value := env.Value // will be empty for valueFrom, but we try
	ev.Value = &value
	ev.Source = InstanceGroupValueSource{
		Kind: InstanceGroupValueSourceKindConfigMap,
		Name: env.ValueFrom.ConfigMapKeyRef.LocalObjectReference.Name,
	}
case env.ValueFrom != nil && env.ValueFrom.FieldRef != nil:
```
For env vars using ValueFrom.ConfigMapKeyRef, this sets Value to env.Value (which will be empty) and never resolves the referenced key from the ConfigMap. This contradicts the function doc that says ConfigMap variables include values. Suggest querying the ConfigMap and reading ConfigMapKeyRef.Key (honor Optional), or set Value=nil and make the doc accurate.
Fixed in 64d1349 — ConfigMapKeyRef env vars now resolve their value via getConfigMapData() (K8s API call). Honors the Optional flag (debug log for optional, warn for required). Value is nil if the lookup fails.
```go
if mount.SubPath != "" {
	// Fetch content for the specific key
	contents, err := getConfigMapFileContents(ctx, l, ig.EnvironmentName, ig.TeamSlug.String(), cmName)
	if err != nil {
		l.log.WithError(err).WithField("configmap", cmName).Warn("failed to resolve configmap content for subPath mount")
		errMsg := fmt.Sprintf("ConfigMap '%s' could not be found or accessed. The application may fail to start until this is resolved.", cmName)
		return []*InstanceGroupMountedFile{{
			Path:   mount.MountPath,
			Source: source,
			Error:  &errMsg,
		}}
	}
	f := &InstanceGroupMountedFile{
		Path:   mount.MountPath,
		Source: source,
	}
	if fc, ok := contents[mount.SubPath]; ok {
		f.Content = &fc.content
		f.IsBinary = fc.isBinary
	}
```
ConfigMap volume mounts with VolumeMount.SubPath are resolved using contents[mount.SubPath], but SubPath refers to the file path within the volume, not necessarily the ConfigMap key (it may match items[].Path when Items are specified). This can silently drop content for subPath mounts. Suggest mapping SubPath -> Key via the provided items (when present), otherwise fall back to using SubPath as the key.
Fixed in 64d1349 — when SubPath is set and items are specified, we now map SubPath back to the actual ConfigMap key via the items list (item.Path → item.Key). Without items, SubPath is used directly as the key (unchanged).
```go
// Fall back to metadata.creationTimestamp
if ts, ok, _ := unstructured.NestedString(obj.Object, "metadata", "creationTimestamp"); ok && ts != "" {
	if t, err := time.Parse(time.RFC3339, ts); err == nil {
```
metadata.creationTimestamp is parsed with time.RFC3339, but Kubernetes timestamps often include fractional seconds. time.Parse(time.RFC3339, ts) will fail in that case, causing extractEventTimestamp to return zero and dropping events. Suggest parsing with time.RFC3339Nano (which accepts both with/without fractions) or trying both formats.
Suggested change:

```diff
-	if t, err := time.Parse(time.RFC3339, ts); err == nil {
+	if t, err := time.Parse(time.RFC3339Nano, ts); err == nil {
```
Fixed in 64d1349 — changed to time.RFC3339Nano which accepts both with and without fractional seconds.
```go
// Fetch events for each pod in the instance group
instances, err := ListInstances(ctx, ig)
if err != nil {
	l.log.WithError(err).Warn("failed to list instances for event fetching")
} else {
	for _, inst := range instances {
		podEvents, err := fetchEvents(ctx, client, namespace, "Pod", inst.Name)
		if err != nil {
			l.log.WithError(err).WithField("pod", inst.Name).Warn("failed to fetch pod events")
			continue
		}
		podName := inst.Name
		for i := range podEvents {
			if ev := translateEvent(&podEvents[i], &podName); ev != nil {
				allEvents = append(allEvents, ev)
			}
		}
	}
```
ListEvents issues one Kubernetes List call per pod instance (plus one for the ReplicaSet). For workloads with many replicas this becomes N+1 API calls and can add significant latency/load to the API server when the GraphQL field is queried. Consider adding server-side limits (Limit/Continue, time window), batching via a single namespace event list + in-memory filtering, and/or caching events per request.
Acknowledged — for typical workloads (2-5 pods) this is acceptable. Events are fetched on-demand and K8s events have ~1 hour TTL. Can be optimized later with namespace-scoped list + in-memory filtering if needed.
```go
// transformReplicaSet strips unnecessary fields from ReplicaSets but keeps
// the data we need: labels, annotations (revision), replicas, pod template
// (containers with env/envFrom/volumeMounts/image, volumes).
func transformReplicaSet(in any) (any, error) {
	rs := in.(*unstructured.Unstructured)

	// --- Extract data we want to keep ---

	// metadata.annotations - keep only deployment.kubernetes.io/revision
	annotations := rs.GetAnnotations()
	newAnnotations := map[string]string{}
	if v, ok := annotations["deployment.kubernetes.io/revision"]; ok {
		newAnnotations["deployment.kubernetes.io/revision"] = v
	}

	// metadata.labels - keep app and team
	labelsToKeep := []string{"app", "team"}
	labels := rs.GetLabels()
	for k := range labels {
		if !slices.Contains(labelsToKeep, k) {
			delete(labels, k)
		}
	}

	// spec.replicas
	replicas, _, _ := unstructured.NestedInt64(rs.Object, "spec", "replicas")

	// spec.template.spec.containers (keep name, image, env, envFrom, volumeMounts)
	containers, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "containers")
	newContainers := make([]any, 0, len(containers))
	for _, container := range containers {
		c, ok := container.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("container is not a map[string]any")
		}
		newC := map[string]any{
			"name": c["name"],
		}
		if img, ok := c["image"]; ok {
			newC["image"] = img
		}
		if env, ok := c["env"]; ok {
			newC["env"] = env
		}
		if envFrom, ok := c["envFrom"]; ok {
			newC["envFrom"] = envFrom
		}
		if vm, ok := c["volumeMounts"]; ok {
			newC["volumeMounts"] = vm
		}
		newContainers = append(newContainers, newC)
	}

	// spec.template.spec.volumes
	volumes, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "volumes")

	// status.replicas and status.readyReplicas
	statusReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "replicas")
	readyReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "readyReplicas")

	// spec.template.metadata.labels (for pod matching)
	templateLabels, _, _ := unstructured.NestedStringMap(rs.Object, "spec", "template", "metadata", "labels")

	// metadata.ownerReferences (to link to Deployment/Application)
	ownerRefs, _, _ := unstructured.NestedSlice(rs.Object, "metadata", "ownerReferences")

	// --- Remove everything ---
	fieldsToRemove := [][]string{
		{"spec"},
		{"status"},
		{"metadata", "managedFields"},
		{"metadata", "generateName"},
	}
	for _, field := range fieldsToRemove {
		unstructured.RemoveNestedField(rs.Object, field...)
	}

	// --- Add back only what we need ---
	rs.SetLabels(labels)
	rs.SetAnnotations(newAnnotations)

	_ = unstructured.SetNestedField(rs.Object, replicas, "spec", "replicas")
	_ = unstructured.SetNestedSlice(rs.Object, newContainers, "spec", "template", "spec", "containers")
	if len(volumes) > 0 {
		_ = unstructured.SetNestedSlice(rs.Object, volumes, "spec", "template", "spec", "volumes")
	}
	if len(templateLabels) > 0 {
		templateLabelsAny := make(map[string]any, len(templateLabels))
		for k, v := range templateLabels {
			templateLabelsAny[k] = v
		}
		_ = unstructured.SetNestedMap(rs.Object, templateLabelsAny, "spec", "template", "metadata", "labels")
	}
	if len(ownerRefs) > 0 {
		_ = unstructured.SetNestedSlice(rs.Object, ownerRefs, "metadata", "ownerReferences")
	}

	_ = unstructured.SetNestedField(rs.Object, statusReplicas, "status", "replicas")
	_ = unstructured.SetNestedField(rs.Object, readyReplicas, "status", "readyReplicas")

	return rs, nil
}
```
This new transformer is fairly involved and is expected to run in informer WatchList paths where transforms can be applied more than once. There are tests for transformer idempotency in other workload packages (e.g. secret). Consider adding unit tests verifying transformReplicaSet is idempotent and preserves the fields needed by queries (revision, replicas, containers env/envFrom/volumeMounts, volumes, ownerRefs).
Acknowledged — transformer tests are on the backlog. Will add idempotency tests in a follow-up.
… value resolution
Lua integration tests covering:
- Application with instance groups (ReplicaSet filtering)
- Instance group with instances (Pod listing)
- Instance group with environment variables (env, configMapKeyRef, secretKeyRef, envFrom)
- Instance group with mounted files (ConfigMap/Secret volume mounts)

Also includes dataloader transformer idempotency unit test.
Include image name, container name, pod name, and rescale size in translated event messages instead of dropping them. Also remove nais.yaml tips from scheduling failure messages.
Force-pushed from 8519e42 to a4382d4.
…ss, and last termination info Extend ApplicationInstanceStatus with ready, lastExitReason, and lastExitCode fields to give users actionable insight into instance health. Replace raw Kubernetes reason strings with human-readable messages. Add TERMINATED state for containers that exited successfully. Fix State() to use classifyWaiting so benign startup states no longer incorrectly report Application as NOT_RUNNING.
…atterns
- Replace benignWaiting map[string]struct{} with slices.Contains
- Separate benign and problem message maps, remove redundant switch
- Replace sort.Slice with slices.SortFunc using time.Compare
- Replace sort.Strings with slices.Sorted(maps.Keys(...))
- Replace IsValid() switch statements with slices.Contains
No description provided.